Add support for reading PDBs from AF2 database#102
Conversation
|
Hello @a-r-j! Thanks for updating this PR.
Comment last updated at 2022-05-12 01:49:06 UTC |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
@rasbt Will add a note to the changelog but unsure how you want to proceed wrt naming convention. |
rasbt
left a comment
There was a problem hiding this comment.
This is really cool stuff! Thanks for this!
Btw. in addition to the other suggestions, one thing I couldn't annotate in the review because the files were empty, so I am commenting about this here: do we need the __init__.py files in the test folders? Just curious what they do? (I mean there is maybe an advantage, but could you maybe summarize it?)
|
I added the __init__s because tests would fail locally for me due to the overlap in the filenames for the mmcif and pdb testsuites. Adding the inits differentiates them in the namespace. |
|
Makes sense, thanks! |
biopandas/mmcif/pandas_mmcif.py
Outdated
|
|
||
| class PandasMmcif: | ||
| def __init__(self, use_auth: bool = True): | ||
| def __init__(self, use_auth: bool = True, af2_version: int = 2): |
There was a problem hiding this comment.
Would this be for specifying which alphafold database to use? I don't think we need this as an __init__ attribute though? Can't it be specified in fetch_mmcif(source="")?
E.g., something like
if source == 'alphafold2-v1':
af2_version = 1
elif source == 'alphafold2-v2':
af2_version = 2
else:
...There was a problem hiding this comment.
Yep, good call I think it belongs somewhere else. The param specifies which release of the database to retrieve the structure from. I don't anticipate it being changed much - the latest version is what I expect 99% of people would use - it was intended to reduce the maintenance burden in between new releases and updating the default.
biopandas/mmcif/pandas_mmcif.py
Outdated
| return self | ||
|
|
||
| def fetch_mmcif(self, pdb_code: Optional[str] = None, uniprot_id: Optional[str] = None, source: str = "pdb"): | ||
| def fetch_mmcif(self, pdb_code: Optional[str] = None, uniprot_id: Optional[str] = None, source: str = "pdb", af2_version: int = 2): |
There was a problem hiding this comment.
This would work, but i wonder if it wouldn't make more sense to make it part of the string, otherwise, we end up with a very specific argument that only makes sense if users select "alphafold2"?
There was a problem hiding this comment.
I think setting it in the string is still problematic as it would require an update to biopandas to support each new release of the database before it can be used
There was a problem hiding this comment.
Hm, I feel like this kind of complicated the user-interface too much. This is a value very specific to one particular "source" choice. Sure, we only have 2 source choices at the moment, but there could be more in the future (e.g., competitors to AF2, potentially AF3 etc.) and then it would become very messy for users to figure out which options go together.
I feel like it would be much simpler to support a fixed number of options for the sake of user-friendliness. If there is a v3 in the future, I don't think it would be much effort to update.
There was a problem hiding this comment.
Sure, I've made the changes :)
|
Looks good to me know! The documentation currently doesn't fully execute because the part on multiple models is not here yet, but I can take care of that when it is merged into main. So the only thing left is the Changelog note, unless you have anything else you would like to add? |
|
Nothing further from me, but I might pick up some loose threads from the open issues over the summer 😁 Thanks for all the help on the PRs - it's super fun working on this with you and I'm learning lots! |
|
Awesome! 🎉🙌. |
Description
Simple PR. Adds an equivalent method (
fetch_af2) tofetch_pdbto interact with structures in the AF DB via their UniProt IDRelated issues or pull requests
None
Pull Request Checklist
./docs/sources/CHANGELOG.mdfile (if applicable)./biopandas/*/testsdirectories (if applicable)biopandas/docs/sources/(if applicable)PYTHONPATH='.' pytest ./biopandas -svand make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)flake8 ./biopandas